SGI Developer Toolbox 6.1


Text File | 1996-11-11 | 8.4 KB | 200 lines

[Chris Pirazzi provides the following useful information (thanks, Chris!)  -Doug]

From cpirazzi@cp.esd.sgi.com Mon Jul 24 22:57:19 1995
Received: from cp.esd.sgi.com by candiru.engr.sgi.com via ESMTP
    (940816.SGI.8.6.9/911001.SGI) for <cook@candiru.engr.sgi.com>
    id WAA26935; Mon, 24 Jul 1995 22:57:19 -0700
Received: by cp.esd.sgi.com (940816.SGI.8.6.9/940406.SGI.AUTO)
    for cook id WAA05094; Mon, 24 Jul 1995 22:57:18 -0700
Date: Mon, 24 Jul 1995 22:57:18 -0700
From: cpirazzi@cp.esd.sgi.com (Chris Pirazzi)
Message-Id: <199507250557.WAA05094@cp.esd.sgi.com>
To: cook@cp.esd.sgi.com
Subject: floating point underflow exceptions
Status: OR

you might want to include some info from here on your signal
processing faq.  this has saved me several times.

this is the final version of my faq item

============================================================================

well, it has taken me several months, but I have finally tracked down
all the bugs and needed details concerning floating point underflow
exception handling.  this information should help people to identify
a condition that could speed up their signal processing code by
hundreds of times.

------------------------------

FAQ item for comp.sys.sgi.misc and comp.sys.sgi.audio FAQ's

(I guess it should probably be in .misc, and there should be a pointer
question in .audio.  It is definitely NOT just an audio thing, but
audio people really want to know this.)

------------------------------

Subject: -XX- why does my floating point signal processing routine,
         when given certain inputs, run incredibly slowly and consume
         all of the CPU in _system_ or _interrupt_ time ?
Date: Mon Jul 24 22:28:13 PDT 1995

You may be experiencing an undesirable "floating point underflow"
behavior of the floating point unit on R3k's and beyond.

Roughly, a floating point underflow (defined in IEEE standard 754)
occurs when a floating point operation creates a non-zero number whose
absolute value is so small that it would cause other exceptions in
subsequent operations.

When an underflow occurs that is not somehow masked, the FPU causes an
interrupt (R3k) or trap (R4k and later) on the CPU, the CPU runs some
code to handle the trap, and then the floating point instruction in
your program completes.  This happens once for each underflowing
instruction.

Recursive filters will often generate large numbers of underflows in
large spans, and so they often clearly reveal the slow processing of
these exceptions.  Code can easily run hundreds or thousands of times
more slowly if it underflows on every operation.

- How Do I Tell if I'm Getting Lots of Floating Point Underflows?

On R3k machines, programs that are performing lots of operations which
underflow will eat up the CPU in "intr" time (the yellow part of the
CPU bar on gr_osview).

On R4k and later machines, such programs will eat up the CPU in
"system" time (the red part of the CPU bar on gr_osview), since trap
handling counts towards "system" time.

You can check for these underflows (and other exceptions) more
reliably by temporarily linking your program with -lfpe, setting the
environment variable TRAP_FPE to 'DEBUG;ALL=COUNT(1)', and running
your program.  This will print something every time any floating point
exception occurs.  You probably want to change the 1 to some large
number so that you can see just how many FPE's occur without waiting
for thousands of printfs.

- How Do I Fix It?

There are two methods:

1. Link your program with -lfpe, and execute the following code
   snippet in your program once, before your signal processing code:

    #include <sigfpe.h>

    /* set underflowing values to zero (_ZERO), but in particular,
       also set the special "flush zero" bit (FS, bit 24) in the
       Control Status register.  This bit exists in R4k and later
       processors.  This special bit will cause the FPU not to
       generate an exception for floating point underflows, and
       quietly substitute zero instead.  On R3k CPUs, this setting
       will be treated just like "_ZERO."
    */
    sigfpe_[_UNDERFL].repls = _FLUSH_ZERO;
    handle_sigfpes(_ON, _EN_UNDERFL, NULL, _ABORT_ON_ERROR, NULL);

2. (works on R4000 and later processors ONLY) execute the following
   code snippet in your program once, before your signal processing
   code (linking with libfpe is neither required nor recommended for
   method 2):

    #include <sys/fpu.h>

    /* set the special "flush zero" bit (FS, bit 24) in the
       Control Status Register of the FPU of R4k and beyond
       so that the result of any underflowing operation will
       be clamped to zero, and no exception of any kind will
       be generated on the CPU.  This has no effect on
       an R3000.
    */
    void flush_all_underflows_to_zero(void)
    {
        union fpc_csr f;
        f.fc_word = get_fpc_csr();
        f.fc_struct.flush = 1;
        set_fpc_csr(f.fc_word);
    }

Method 2 is highly recommended and preferred for development of any
code that does not need to execute on R3000 CPUs.  See below for why.
Note that any code compiled -mips2 or higher already has this
restriction built-in, and so should use method 2.

In general, it makes sense for all signal processing code to include
one of these code snippets for better performance, since it doesn't
hurt, and since underflows often come up unexpectedly.

- What Is Going On?

Note that on R4k's and above, method 1 ends up performing exactly the
same Control Status Register operation as method 2, plus a lot of
other unnecessary stuff.

Method 1 requires the application to link with libfpe.  This library
was designed for use in trapping and debugging floating point
exceptions, not silencing them.  For example, any app that links with
libfpe must deal with several subtle and nasty side effects relating
to signal handling.  This is why we strongly recommend the use of
method 2 whenever possible.
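
For code that is also built on non-SGI machines, method 2 can be
wrapped so that it compiles everywhere.  The following is only a
sketch: the function name request_flush_to_zero is made up for this
example, and it assumes that the compiler predefines __sgi on IRIX.
The body of the IRIX branch is the method 2 snippet from above,
unchanged.

```c
#if defined(__sgi)            /* assumption: predefined by IRIX compilers */
#include <sys/fpu.h>
#endif

/* Hypothetical wrapper around method 2.
   Returns 1 if the "flush zero" bit was requested, or 0 if this
   build has no access to the fpc_csr interface.  As noted above,
   setting the bit has no effect on an R3000. */
int request_flush_to_zero(void)
{
#if defined(__sgi)
    union fpc_csr f;

    f.fc_word = get_fpc_csr();
    f.fc_struct.flush = 1;
    set_fpc_csr(f.fc_word);
    return 1;
#else
    return 0;
#endif
}
```

Call it once at startup, before any signal processing code runs.  The
return value only says whether the request was compiled in, not
whether the FPU honors it.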

But, as we mentioned, method 1 is the only solution available for R3k
machines.

On R3k's, the code snippet above causes the interrupt handler for the
underflow exception to clamp the underflowing value to zero.  This
code does NOT prevent the FPU from issuing future underflow interrupts
(these interrupts cannot be disabled on the R3k), but it does severely
decrease the likelihood that you will run into serious performance
degradations due to underflow.  This is because the underflows in a
typical recursive filter come in large spans of several thousand
underflows that occur before the accumulated value finally reaches
zero.  This libfpe setting "catches" and clamps such underflow spans
at the moment that they begin.

Note that we have used the constant _FLUSH_ZERO instead of _ZERO so
that this snippet also solves the underflow problems on R4ks and
beyond.  On R3ks, _FLUSH_ZERO and _ZERO are equivalent.

On R4k and later FPUs, method 1 and method 2 both set a bit on the FPU
which prevents the FPU from issuing any trap on underflow.  The FPU
quietly substitutes zero for the result of underflowing operations.
Therefore, this setting is even more effective than it is on R3k's.
It may still be less efficient than an algorithm which never
underflows, though.

On processors later than R4k (R8k in fast mode, for example), this
behavior may be the default, so you may never see the problem.

KNOWN BUGS:

In IRIX 5.3, two regressions occurred that did not exist in earlier
IRIX releases.

R4600 CPUs: the libfpe library (and thus method 1) does not work at
all on R4600s.  Any attempt to set libfpe options will result in a
message about "unknown CPU type."  A patch is in the works for this.
Contact your SGI service representative about possible patches for
internal bug number 275803 or for libfpe.  You can also use method 2
to get around this bug.

R3000 CPUs: the libfpe library (and thus method 1) does not work at
all on R3000s.  Any attempt to set libfpe options will result in a
message about "cause bits" and an abort (core dump).  A patch is in
the works for this rather serious regression.  Contact your SGI
service representative about possible patches for internal bug number
276012.  This bug is a kernel bug and requires a kernel patch.
Programs that attempt to intercept SIGFPE directly (i.e., not via
libfpe) are also affected by this bug.

NEW INFO: as of this writing, the fix for this R3000 bug made it into
the R3000 kernel patch number 676, which has not yet been released.
By the time you read this, there may be another higher-numbered R3000
kernel patch that includes the fixes of patch 676 and other fixes too.
Contact your SGI service representative to be sure.
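
The underflow spans described under "What Is Going On?" can be
reproduced with a few lines of portable C.  The sketch below is not
SGI code: count_underflows and is_denormal are names made up for this
example, and it uses "the result is a denormalized float" as a
stand-in for "the operation underflowed."  It runs the simplest
recursive filter, y[n] = decay * y[n-1], with no further input and
counts the underflowing results, either leaving them alone or
replacing the first one with zero the way the clamp-to-zero handler
is described above.

```c
#include <assert.h>
#include <float.h>    /* FLT_MIN: smallest normalized float */
#include <stddef.h>

/* Hypothetical helper: non-zero if v is a denormalized float,
   i.e. non-zero but smaller in magnitude than FLT_MIN. */
static int is_denormal(float v)
{
    float mag = (v < 0.0f) ? -v : v;

    return mag != 0.0f && mag < FLT_MIN;
}

/* Hypothetical helper: run y[n] = decay * y[n-1] starting from
   y = 1 for nsamples steps and count denormalized results.
   If clamp is non-zero, each denormalized result is replaced
   by zero before the next step. */
size_t count_underflows(float decay, size_t nsamples, int clamp)
{
    volatile float y = 1.0f;  /* volatile: store every result as a float */
    size_t n, count = 0;

    assert(decay > 0.0f && decay < 1.0f);

    for (n = 0; n < nsamples; n++) {
        y = y * decay;
        if (is_denormal(y)) {
            count++;
            if (clamp)
                y = 0.0f;     /* the span ends where it begins */
        }
    }
    return count;
}
```

On a machine that produces denormalized results (that is, one where
the "flush zero" bit is not set), count_underflows(0.5f, 200, 0)
should come to 23 and count_underflows(0.5f, 200, 1) to 1.  Decay
factors closer to 1 give much longer unclamped spans, while the
clamped count stays at 1.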